How Similar are Chinese and Japanese for Cross-Language Information Retrieval?
Author
Abstract
For NTCIR Workshop 5, UC Berkeley participated in the bilingual task of the CLIR track. Our focus was on Chinese topic searches against the Japanese news document collection and on Japanese topic searches against the Chinese news document collection. Extending our work from the NTCIR-4 workshop, we performed search experiments in which we segmented Chinese search topics and used them directly as if they were Japanese topics, and vice versa. We also used a commercial machine translation (MT) system to translate between the two languages, with English as a pivot language. The best performance for Chinese topic searches against Japanese documents was achieved with a hybrid approach that combined MT pivot translation with the direct use of Chinese topic expressions.
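The abstract does not say how the MT pivot output and the directly used Chinese expressions were merged. As a rough illustration only, the Python sketch below assumes overlapping character bigrams as the segmentation for direct use and a simple linear interpolation with a hypothetical weight `alpha`. The MT step itself is not modeled: the caller supplies the Japanese terms that the pivot translation produced. The function names `char_bigrams` and `hybrid_query` are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def char_bigrams(text: str) -> list[str]:
    """Split text into overlapping character bigrams, ignoring whitespace.

    Assumption: bigrams stand in for whatever segmenter was actually used.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def hybrid_query(direct_topic: str, mt_terms: list[str], alpha: float = 0.5) -> dict[str, float]:
    """Merge two weighted term vectors into one query (hypothetical scheme).

    direct_topic: the Chinese topic text, used as-is as if it were Japanese.
    mt_terms: Japanese terms produced by pivot MT (Chinese -> English -> Japanese).
    alpha: weight on the direct-use terms; (1 - alpha) goes to the MT terms.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    direct = Counter(char_bigrams(direct_topic))
    mt = Counter(mt_terms)
    query: dict[str, float] = {}
    # Direct use: Chinese bigrams contribute only where the character forms
    # happen to be identical in Japanese text.
    for term, tf in direct.items():
        query[term] = query.get(term, 0.0) + alpha * tf
    # Pivot MT: terms translated through English.
    for term, tf in mt.items():
        query[term] = query.get(term, 0.0) + (1 - alpha) * tf
    return query


# Example: a topic whose characters are written the same way in both languages.
print(hybrid_query("日本首相", ["首相", "日本"], alpha=0.5))
```

In this sketch, direct use only helps for terms whose character forms coincide in both scripts; simplified Chinese 经 and Japanese 経, for instance, would never match, while terms supplied by the MT output are unaffected by such script differences.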
Similar resources
Japanese-Chinese Cross-Language Information Retrieval: An Interlingua Approach
Electronically available multilingual information can be divided into two major categories: (1) alphabetic language information (English-like alphabetic languages) and (2) ideographic language information (Chinese-like ideographic languages). The information available in non-English alphabetic languages as well as in ideographic languages (especially, in Japanese and Chinese) is growing at an i...
Chinese-Japanese Cross Language Information Retrieval: A Han Character Based Approach
In this paper, we investigate cross language information retrieval (CLIR) for Chinese and Japanese texts utilizing Han characters, the common ideographs used in writing the Chinese, Japanese and Korean (CJK) languages. The Unicode encoding scheme, which encodes the superset of Han characters, is used as a common encoding platform to deal with the multilingual collection in a uniform manner. We discu...
Cross-language Information Retrieval, Document Alignment and Visualization – A Study with Japanese and Chinese
With the advent of the Internet and digital libraries, as well as the proliferation of multilingual information, sophisticated methods for representing, indexing, and retrieving such information are essential. In recent years, the amount of electronically available information has escalated. Non-English information (information in Asian and European languages) is growing rapidly. A...
NTCIR CLIR Experiments at the University of Maryland
This paper presents results for the Japanese/English cross-language information retrieval task on the NACSIS Test Collection. Two automatic dictionary-based query translation techniques were tried with four variants of the queries. The results indicate that longer queries outperform the required description-only queries and that use of the first translation in the dictionary is comparable with the ...
AINLP at NTCIR-6: Evaluations for Multilingual and Cross-Lingual Information Retrieval
In this paper, a multilingual cross-lingual information retrieval (CLIR) system is presented and evaluated in the NTCIR-6 project. We use language-independent indexing technology to process the Chinese, Japanese, Korean, and English text collections. Different machine translation systems are used to translate the queries for bilingual and multilingual CLIR. The experimental results...